Semantic Indexing of Multilingual Corpora and its Application on the History Domain / Raganato, Alessandro; Camacho-Collados, Jose; Raganato, Antonio; Joung, Yunseo. - (2016), pp. 140-147. (Paper presented at the conference LT for DH: Language Technology Resources and Tools for Digital Humanities, held in Osaka, Japan).

Semantic Indexing of Multilingual Corpora and its Application on the History Domain

Raganato, Alessandro; Camacho-Collados, Jose; Raganato, Antonio; Joung, Yunseo
2016

Abstract

The growing number of multilingual text collections available in different domains makes their automatic processing essential for the development of a given field. However, standard processing techniques based on statistical clues and keyword searches have clear limitations. Instead, we propose a knowledge-based processing pipeline that overcomes most of the limitations of these techniques. This, in turn, enables direct comparison across texts in different languages without the need for translation. In this paper we show the potential of this approach for semantically indexing multilingual text collections in the history domain. In our experiments we used a version of the Bible translated into four different languages, evaluating the precision of our semantic indexing pipeline and showing its reliability on a cross-lingual text retrieval task.
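
The idea of comparing texts across languages without translation can be illustrated with a minimal sketch. It is not the pipeline described in the paper: the lexicon `TOY_LEXICON`, the concept identifiers, and the helper names `semantic_index` and `cosine` are all hypothetical, and the sketch assumes each word maps to exactly one concept, so it skips the disambiguation step that a real knowledge-based system would need.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon: (language, word) -> language-independent concept ID.
# A real system would draw these from a multilingual knowledge base.
TOY_LEXICON = {
    ("en", "king"): "c:king",
    ("it", "re"): "c:king",
    ("es", "rey"): "c:king",
    ("en", "temple"): "c:temple",
    ("it", "tempio"): "c:temple",
    ("es", "templo"): "c:temple",
    ("en", "river"): "c:river",
    ("it", "fiume"): "c:river",
}


def semantic_index(text, lang):
    """Map a text to a bag of concept IDs, ignoring words not in the lexicon."""
    tokens = text.lower().split()
    return Counter(
        TOY_LEXICON[(lang, tok)] for tok in tokens if (lang, tok) in TOY_LEXICON
    )


def cosine(a, b):
    """Cosine similarity between two concept bags (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


english = semantic_index("the king built the temple", "en")
italian = semantic_index("il re costruì il tempio", "it")
unrelated = semantic_index("il fiume", "it")

# Texts in different languages are compared directly in concept space.
print(cosine(english, italian))
print(cosine(english, unrelated))
```

Because both texts are reduced to the same concept identifiers, the English and Italian sentences share an index even though they have no surface words in common, while the unrelated Italian text shares no concepts with the English one.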
Year: 2016
Conference: LT for DH: Language Technology Resources and Tools for Digital Humanities
Keywords: multilinguality; semantic indexing; cross-lingual text retrieval
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: Ragnato_Semantic-indexing_2016.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 648.54 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1553729